Rejection Sampling for Weighted Jaccard Similarity Revisited

Authors

Abstract

Efficiently computing the weighted Jaccard similarity has become an active research topic in machine learning and theory. For sparse data, the standard technique is based on consistent weighted sampling (CWS). For dense data, however, methods based on rejection sampling (RS) can be much more efficient. Nevertheless, existing RS methods are still slow for practical purposes. In this paper, we propose to improve RS by a strategy, which we call efficient rejection sampling (ERS), based on "early stopping + densification". We analyze the statistical property of ERS and provide experimental results to compare ERS with RS and other algorithms for hashing weighted Jaccard. The results demonstrate that ERS significantly improves the existing methods for estimating the weighted Jaccard similarity in relatively dense data.
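As a point of reference for the quantities named in the abstract, the following is a minimal sketch of the weighted Jaccard similarity and of a plain rejection-sampling hash for it. It is not the ERS method of the paper: the abstract does not spell out "early stopping + densification", so neither is implemented here. The function names (`weighted_jaccard`, `rs_hash`, `estimate_similarity`) and the per-dimension bounds `upper` are illustrative assumptions, and the sketch assumes non-negative weights with `x[i] <= upper[i]`.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def weighted_jaccard(x, y):
    """Exact weighted Jaccard: sum of minima over sum of maxima."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den > 0 else 0.0


def rs_hash(x, upper, seed):
    """Basic rejection-sampling hash (hypothetical helper, not ERS).

    Dimension i owns the interval [offsets[i], offsets[i] + upper[i]);
    its first x[i] units are the "accept" region. The hash value is the
    number of shared pseudo-random draws needed to hit that region.
    """
    if not any(w > 0 for w in x):
        raise ValueError("x must have at least one positive weight")
    offsets = [0.0] + list(accumulate(upper))
    total = offsets[-1]
    rng = random.Random(seed)  # same seed gives the same draw sequence for every vector
    draws = 0
    while True:
        draws += 1
        r = rng.uniform(0.0, total)
        # Locate the dimension whose interval contains r (clamped at the right edge).
        i = min(bisect_right(offsets, r) - 1, len(x) - 1)
        if r - offsets[i] < x[i]:
            return draws


def estimate_similarity(x, y, upper, k=2000):
    """Fraction of k independent hashes on which x and y collide."""
    hits = sum(rs_hash(x, upper, s) == rs_hash(y, upper, s) for s in range(k))
    return hits / k


x = [0.5, 1.0, 0.0, 2.0]
y = [1.0, 0.5, 0.5, 2.0]
upper = [1.0, 1.0, 1.0, 2.0]  # assumed known bounds on each weight
exact = weighted_jaccard(x, y)
approx = estimate_similarity(x, y, upper)
```

Two vectors collide on one hash exactly when the first draw that lands in either accept region lands in both, which happens with probability equal to the weighted Jaccard similarity. The expected number of draws per hash grows as the accept region shrinks relative to the total bound, which is why this family of methods suits relatively dense data.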

Similar resources

Improved Consistent Weighted Sampling Revisited

Min-Hash is a popular technique for efficiently estimating the Jaccard similarity of binary sets. Consistent Weighted Sampling (CWS) generalizes the Min-Hash scheme to sketch weighted sets and has drawn increasing interest from the community. Due to its constant-time complexity independent of the values of the weights, Improved CWS (ICWS) is considered as the state-of-the-art CWS algorithm. In ...

Unilateral Jaccard Similarity Coefficient

Similarity measures are essential to solve many pattern recognition problems such as classification, clustering, and retrieval problems. Various similarity measures are categorized in both syntactic and semantic relationships. In this paper we present a novel similarity, Unilateral Jaccard Similarity Coefficient (uJaccard), which doesn’t only take into consideration the space among two points b...

PrivMin: Differentially Private MinHash for Jaccard Similarity Computation

In many industrial applications of big data, the Jaccard Similarity Computation has been widely used to measure the distance between two profiles or sets respectively owned by two users. Yet, one semi-honest user with unpredictable knowledge may also deduce the private or sensitive information (e.g., the existence of a single element in the original sets) of the other user via the shared simila...

SuperMinHash - A New Minwise Hashing Algorithm for Jaccard Similarity Estimation

This paper presents a new algorithm for calculating hash signatures of sets which can be directly used for Jaccard similarity estimation. The new approach is an improvement over the MinHash algorithm, because it has a better runtime behavior and the resulting signatures allow a more precise estimation of the Jaccard index.

Forecasting Model Based on Neutrosophic Logical Relationship and Jaccard Similarity

The daily fluctuation trends of a stock market are illustrated by three statuses: up, equal, and down. These can be represented by a neutrosophic set which consists of three functions—truth-membership, indeterminacy-membership, and falsity-membership. In this paper, we propose a novel forecasting model based on neutrosophic set theory and the fuzzy logical relationships between the status of hi...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2021

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v35i5.16543